# Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

This repository accompanies the paper "*Beyond Concept Bottleneck Models: 
How to Make Black Boxes Intervenable?*"

**Abstract**: Recently, interpretable machine learning has re-explored 
concept bottleneck models (CBM), comprising step-by-step prediction of 
the high-level concepts from the raw features and the target variable 
from the predicted concepts. A compelling advantage of this model class 
is the user's ability to intervene on the predicted concept values, 
consequently affecting the model's downstream output. In this work, we introduce 
a method to perform such concept-based interventions on already-trained neural networks,
which are not interpretable by design. Furthermore, we formalise
the model's *intervenability* as a measure of the effectiveness
of concept-based interventions and leverage this definition to fine-tune black-box models. 
Empirically, we explore the intervenability of black-box classifiers on synthetic
tabular and natural image benchmarks. We demonstrate that fine-tuning improves 
intervention effectiveness and often yields better-calibrated predictions. To showcase 
the practical utility of the proposed techniques, we apply them to deep chest X-ray 
classifiers and show that fine-tuned black boxes can be as intervenable as, and more 
performant than, CBMs.
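
For readers new to the idea, the snippet below is a minimal, self-contained sketch of a concept-based intervention in a toy CBM. It is not taken from this repository: the function names (`predict_concepts`, `predict_target`, `intervene`) and all weights are hypothetical and chosen only for illustration. It shows how replacing a predicted concept with a user-provided value changes the downstream prediction.

```python
import math


def sigmoid(t):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_concepts(x, W_c, b_c):
    """Concept predictor: raw features -> concept probabilities."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_c, b_c)
    ]


def predict_target(c, w_y, b_y):
    """Target predictor: concepts -> probability of the positive class."""
    return sigmoid(sum(w * ci for w, ci in zip(w_y, c)) + b_y)


def intervene(c_hat, c_user, indices):
    """Return a copy of c_hat with the concepts at `indices` replaced by user values."""
    c_new = list(c_hat)
    for i in indices:
        c_new[i] = c_user[i]
    return c_new


# Hypothetical toy parameters (two input features, two concepts).
W_c = [[2.0, -1.0], [-1.5, 0.5]]
b_c = [0.0, 0.0]
w_y = [3.0, -3.0]
b_y = 0.0

x = [-1.0, 0.5]        # raw features of one data point
c_user = [1.0, 0.0]    # concept values supplied by the user

c_hat = predict_concepts(x, W_c, b_c)          # predicted concepts
y_before = predict_target(c_hat, w_y, b_y)     # prediction without intervention

c_int = intervene(c_hat, c_user, indices=[0])  # correct only the first concept
y_after = predict_target(c_int, w_y, b_y)      # prediction after intervention

print(f"before: {y_before:.3f}, after: {y_after:.3f}")
```

In a CBM the concepts are an explicit layer, so the replacement above is trivial. A black-box network has no such layer, which is the gap the paper's intervention method addresses.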

### Usage 

All required libraries are listed in the conda environment file `environment.yml`. 
To create and activate the environment, run:
```
conda env create -f environment.yml   # install dependencies
conda activate intervenable-models    # activate environment
```

- `train.py` and `validate.py` train and validate models
- `intervene.py` implements intervention and fine-tuning procedures
- `models.py` and `losses.py` define models and loss functions
- `networks.py` provides neural network architectures
- `probes.py` contains the utility functions for probing
- `./datasets` contains the data loaders and the processing scripts that generate the synthetic data and the modified chest X-ray files
- `./configs` contains example configuration files for training black-box and CBM classifiers

Further details are documented in the code.
